Project Introduction to Neural Networks

Andres Delgadillo

1 Project: Bank Churn Prediction

1.1 Objective

Given a bank customer, build a neural network-based classifier that predicts whether the customer will leave the bank within the next 6 months.

1.2 Data Dictionary

1.3 Questions to be answered

2 Import packages and turn off warnings

3 Import dataset and quality of data

This first assessment of the dataset shows:

4 Exploratory Data Analysis

4.1 Pandas profiling report

We can get a first statistical and descriptive analysis using pandas_profiling

The Pandas Profiling report flags the following warnings and characteristics in the data:

4.2 Univariate Analysis

4.3 Pairplot

We are going to perform bivariate analysis to understand the relationship between the columns

4.4 Bivariate and Multivariate Analysis

4.4.1 Exited and Continuous features

Observations

4.4.2 Exited and Geography

4.4.3 Exited and Gender

4.4.4 Exited and NumOfProducts

4.4.5 Exited and HasCrCard

4.4.6 Exited and IsActiveMember

5 Data Pre-Processing

5.1 Feature Engineering

5.2 Data Preparation for Modeling

5.2.1 Training, validation and test sets
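
A three-way split for a churn problem usually needs to preserve the share of exited customers in each subset. The following is a minimal sketch, not the notebook's actual code: the helper `stratified_split`, the 60/20/20 proportions, and the synthetic target are assumptions made for illustration. In practice, calling scikit-learn's `train_test_split` twice with its `stratify` argument is a common alternative.

```python
import numpy as np

def stratified_split(y, fractions=(0.6, 0.2, 0.2), seed=0):
    """Return (train, val, test) index arrays that preserve class proportions.

    Hypothetical helper; the fractions are an assumption, not taken from the project.
    """
    rng = np.random.default_rng(seed)
    y = np.asarray(y)
    parts = ([], [], [])
    for cls in np.unique(y):
        # Shuffle the row indices of this class, then cut them into three pieces
        idx = rng.permutation(np.flatnonzero(y == cls))
        n_train = int(round(fractions[0] * len(idx)))
        n_val = int(round(fractions[1] * len(idx)))
        parts[0].append(idx[:n_train])
        parts[1].append(idx[n_train:n_train + n_val])
        parts[2].append(idx[n_train + n_val:])
    return tuple(np.sort(np.concatenate(p)) for p in parts)

# Synthetic target with 20% positives (illustrative, not the real dataset)
y = np.array([1] * 200 + [0] * 800)
train_idx, val_idx, test_idx = stratified_split(y)
print(len(train_idx), len(val_idx), len(test_idx))
```

Each subset keeps the same positive rate as the full target, so validation and test metrics are measured on the same class balance the model was trained on.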

5.2.2 Creating Dummy Variables
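
Categorical columns such as Geography and Gender have to be converted to numeric indicators before they can feed a neural network. A minimal sketch with `pandas.get_dummies` follows; the category values and the choice of `drop_first=True` are illustrative assumptions, not necessarily what the notebook used.

```python
import pandas as pd

# Toy frame; the category values are made up for illustration
df = pd.DataFrame({
    "Geography": ["France", "Spain", "Germany", "France"],
    "Gender": ["Female", "Male", "Male", "Female"],
    "NumOfProducts": [1, 2, 1, 3],
})

# drop_first=True drops one level per feature to avoid redundant columns
encoded = pd.get_dummies(df, columns=["Geography", "Gender"], drop_first=True)
print(list(encoded.columns))
```

When the encoding is applied separately to the training, validation, and test sets, it is worth checking that all three end up with the same columns in the same order.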

5.3 Scaling Data
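
Scaling statistics should be estimated on the training set only and then reused for the validation and test sets, so that no information leaks from held-out data. The sketch below shows standardization (zero mean, unit variance) under that assumption; the helper names and the random data are hypothetical, and the notebook may have used a different scaler.

```python
import numpy as np

def fit_standardizer(X_train):
    """Estimate per-column mean and standard deviation on the training data."""
    mean = X_train.mean(axis=0)
    std = X_train.std(axis=0)
    std[std == 0] = 1.0  # guard against constant columns
    return mean, std

def standardize(X, mean, std):
    """Apply previously fitted statistics to any dataset."""
    return (X - mean) / std

rng = np.random.default_rng(0)
X_train = rng.normal(loc=50.0, scale=10.0, size=(100, 3))  # synthetic features
X_val = rng.normal(loc=50.0, scale=10.0, size=(20, 3))

mean, std = fit_standardizer(X_train)
X_train_s = standardize(X_train, mean, std)
X_val_s = standardize(X_val, mean, std)  # reuses the training statistics
```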

6 Models evaluation criteria

6.1 Insights

6.1.1 The model can make wrong predictions in two ways:

  1. Predicting that a customer will leave the bank when the customer actually keeps the accounts (false positive)
  2. Predicting that a customer will not leave the bank when the customer actually closes the accounts (false negative)
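
The two error types above can be counted directly from the labels and predictions. The sketch below uses made-up labels and a hypothetical helper `confusion_counts`; it treats 1 as "customer exits" and shows how recall reacts to the second kind of error while precision reacts to the first.

```python
import numpy as np

def confusion_counts(y_true, y_pred):
    """Count the four outcomes of a binary classifier (1 = customer exits)."""
    y_true = np.asarray(y_true).astype(bool)
    y_pred = np.asarray(y_pred).astype(bool)
    return {
        "tp": int(np.sum(y_true & y_pred)),    # predicted exit, did exit
        "fp": int(np.sum(~y_true & y_pred)),   # predicted exit, stayed
        "fn": int(np.sum(y_true & ~y_pred)),   # predicted stay, did exit
        "tn": int(np.sum(~y_true & ~y_pred)),  # predicted stay, stayed
    }

# Made-up labels and predictions for illustration
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 1, 0, 1, 0, 0, 0, 0]

counts = confusion_counts(y_true, y_pred)
recall = counts["tp"] / (counts["tp"] + counts["fn"])     # lowered by missed exits
precision = counts["tp"] / (counts["tp"] + counts["fp"])  # lowered by false alarms
print(counts, recall, precision)
```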

6.1.2 Which case is more important?

6.1.3 How to reduce this loss?

6.2 Functions to evaluate models

7 Model Building - Neural Network

7.1 Creating the model

Now, we are going to create the Neural Network using Keras

We are going to compile the model using:

Summary of the model

7.2 Training (Forward Propagation and Backpropagation)

Now, we are going to train the model for 100 epochs.
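
Keras runs the forward pass, the backward pass, and the weight update inside `fit`. To make those steps visible, here is a NumPy-only sketch of what one epoch does for a tiny network with one hidden layer. It is not the notebook's model: the layer sizes, the tanh activation, the learning rate, plain full-batch gradient descent, and the synthetic data are all assumptions chosen to keep the example short.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def bce(y, p, eps=1e-12):
    """Binary cross-entropy averaged over the batch."""
    p = np.clip(p, eps, 1 - eps)
    return float(-np.mean(y * np.log(p) + (1 - y) * np.log(1 - p)))

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 4))                             # synthetic features
y = (X[:, 0] + X[:, 1] > 0).astype(float).reshape(-1, 1)  # synthetic labels

# One hidden layer with 8 units (illustrative sizes)
W1 = rng.normal(scale=0.5, size=(4, 8))
b1 = np.zeros((1, 8))
W2 = rng.normal(scale=0.5, size=(8, 1))
b2 = np.zeros((1, 1))
lr = 0.1
losses = []

for epoch in range(100):
    # Forward propagation
    a1 = np.tanh(X @ W1 + b1)
    p = sigmoid(a1 @ W2 + b2)
    losses.append(bce(y, p))

    # Backpropagation (gradients of the mean cross-entropy)
    dz2 = (p - y) / len(X)
    dW2 = a1.T @ dz2
    db2 = dz2.sum(axis=0, keepdims=True)
    dz1 = (dz2 @ W2.T) * (1.0 - a1 ** 2)  # derivative of tanh
    dW1 = X.T @ dz1
    db1 = dz1.sum(axis=0, keepdims=True)

    # Gradient descent update
    W1 -= lr * dW1
    b1 -= lr * db1
    W2 -= lr * dW2
    b2 -= lr * db2

print(losses[0], losses[-1])
```

Plotting `losses` against the epoch number gives the same kind of learning curve that the Keras training history provides.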

Observation

7.3 ROC-AUC

Now, we are going to analyze the ROC-AUC in the training and validation sets

Training set

Validation set

Determine the optimal threshold from the ROC curve
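
One common way to pick a threshold from the ROC curve is to maximize the difference between the true positive rate and the false positive rate (Youden's J statistic). Whether the notebook used this criterion is an assumption; the sketch below computes the ROC points by hand on made-up scores with hypothetical helper names. With scikit-learn, `roc_curve` returns the same three arrays (`fpr`, `tpr`, `thresholds`).

```python
import numpy as np

def roc_points(y_true, scores):
    """Compute (fpr, tpr, thresholds) using each distinct score as a threshold."""
    y_true = np.asarray(y_true)
    scores = np.asarray(scores)
    thresholds = np.unique(scores)[::-1]  # descending
    pos = np.sum(y_true == 1)
    neg = np.sum(y_true == 0)
    tpr = np.array([np.sum((scores >= t) & (y_true == 1)) / pos for t in thresholds])
    fpr = np.array([np.sum((scores >= t) & (y_true == 0)) / neg for t in thresholds])
    return fpr, tpr, thresholds

def youden_threshold(y_true, scores):
    """Return the threshold that maximizes tpr - fpr."""
    fpr, tpr, thresholds = roc_points(y_true, scores)
    return float(thresholds[np.argmax(tpr - fpr)])

# Made-up labels and predicted probabilities
y_true = [0, 0, 0, 0, 1, 0, 1, 1]
scores = [0.05, 0.10, 0.20, 0.30, 0.35, 0.40, 0.70, 0.90]

best = youden_threshold(y_true, scores)
print(best)  # 0.35
```

On imbalanced data the selected threshold is often well below 0.5, so the threshold should be chosen on the validation set and then applied unchanged to the test set.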

7.4 Model performance

Now, we are going to calculate the different metrics and evaluate the performance of the model on the training and validation sets.

Observations

8 Model Performance Improvement

We are going to try different strategies to improve the model performance:

Early Stopping Callback

We will use an early stopping callback with each model.
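
In Keras this is done with the `EarlyStopping` callback, which takes arguments such as `monitor`, `patience`, and `restore_best_weights`. The framework-free sketch below only illustrates the patience logic behind it; the function name, the monitored quantity (validation loss), and the patience value are assumptions, since the settings used in the notebook are not shown here.

```python
def early_stopping_epoch(val_losses, patience=3, min_delta=0.0):
    """Return (stop_epoch, best_epoch) for a sequence of validation losses.

    Training stops once the loss has failed to improve by more than
    min_delta for `patience` consecutive epochs.
    """
    best_loss = float("inf")
    best_epoch = -1
    wait = 0
    for epoch, loss in enumerate(val_losses):
        if loss < best_loss - min_delta:
            best_loss, best_epoch, wait = loss, epoch, 0
        else:
            wait += 1
            if wait >= patience:
                return epoch, best_epoch
    return len(val_losses) - 1, best_epoch

# Made-up validation losses: improvement stalls after epoch 2
val_losses = [0.60, 0.50, 0.45, 0.46, 0.47, 0.48, 0.40]
stop_epoch, best_epoch = early_stopping_epoch(val_losses, patience=3)
print(stop_epoch, best_epoch)  # 5 2
```

Note that the later improvement at epoch 6 is never reached: a small patience saves training time but can stop before a late improvement, which is the trade-off to keep in mind when choosing it.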

8.1 Change learning rate

8.2 Add more hidden layers

8.3 Weight Initialization
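
Two widely used initialization schemes are Glorot (Xavier) uniform and He normal. Which scheme the notebook tried is not stated here, so the sketch below is illustrative only; it uses hypothetical helper names and a plain normal distribution for the He variant, whereas Keras's `he_normal` draws from a truncated normal.

```python
import numpy as np

def glorot_uniform(fan_in, fan_out, rng):
    """Uniform in [-limit, limit] with limit = sqrt(6 / (fan_in + fan_out))."""
    limit = np.sqrt(6.0 / (fan_in + fan_out))
    return rng.uniform(-limit, limit, size=(fan_in, fan_out))

def he_normal(fan_in, fan_out, rng):
    """Zero-mean normal with standard deviation sqrt(2 / fan_in)."""
    return rng.normal(0.0, np.sqrt(2.0 / fan_in), size=(fan_in, fan_out))

rng = np.random.default_rng(0)
W_glorot = glorot_uniform(256, 128, rng)
W_he = he_normal(256, 128, rng)
print(W_glorot.std(), W_he.std())
```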

8.4 Weighted loss to account for large class imbalance
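
A weighted loss gives errors on the minority class (customers who exit) more influence during training. A common heuristic is the "balanced" weight `n_samples / (n_classes * class_count)`; using it here is an assumption, as are the helper name and the synthetic class counts. The resulting dictionary has the form expected by the `class_weight` argument of Keras `Model.fit`.

```python
import numpy as np

def balanced_class_weights(y):
    """Weight each class inversely to its frequency."""
    y = np.asarray(y)
    classes, counts = np.unique(y, return_counts=True)
    weights = len(y) / (len(classes) * counts)
    return {int(c): float(w) for c, w in zip(classes, weights)}

# Synthetic target: 80% stay (0), 20% exit (1)
y_train = np.array([0] * 800 + [1] * 200)
class_weight = balanced_class_weights(y_train)
print(class_weight)  # {0: 0.625, 1: 2.5}
```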

9 Model performances on test set

Now, we are going to compare all models on the test set.

Observations

10 Conclusion and key takeaways